Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition. Clusters were leveraged to indicate information saliency and to avoid redundancy. These methods focused on clustering sentences, even though closely related sentences usually also contain non-aligning information. In this work, we revisit the clustering approach, grouping together propositions for more precise information alignment. Specifically, our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster by fusing its propositions. Our summarization method outperforms the previous state-of-the-art MDS method on the DUC 2004 and TAC 2011 datasets, both in automatic ROUGE scores and in human preference.
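A minimal sketch of the paraphrase-clustering step in Python, assuming salient propositions have already been extracted as strings; the embedding model, clustering algorithm and distance threshold are illustrative stand-ins, not the authors' exact pipeline:

```python
from sentence_transformers import SentenceTransformer  # hypothetical model choice
from sklearn.cluster import AgglomerativeClustering    # requires scikit-learn >= 1.2

propositions = [
    "The storm knocked out power to 30,000 homes.",
    "About 30,000 households lost electricity during the storm.",
    "Repair crews restored service within two days.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(propositions, normalize_embeddings=True)

# Group propositions with small pairwise cosine distance into paraphrastic
# clusters; each cluster would then be fused into one representative sentence
# by a separate sentence-fusion model.
clusterer = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.4, metric="cosine", linkage="average"
)
labels = clusterer.fit_predict(embeddings)
print(labels)  # e.g. [0, 0, 1]: the first two propositions are paraphrases
```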
Classical reinforcement learning (RL) techniques are generally concerned with the design of decision-making policies driven by the maximisation of the expected outcome. Nevertheless, this approach does not take into consideration the potential risk associated with the actions taken, which may be critical in certain applications. To address that issue, the present research work introduces a novel methodology based on distributional RL to derive sequential decision-making policies that are sensitive to the risk, the latter being modelled by the tail of the return probability distribution. The core idea is to replace the $Q$ function generally standing at the core of learning schemes in RL by another function taking into account both the expected return and the risk. Named the risk-based utility function $U$, it can be extracted from the random return distribution $Z$ naturally learnt by any distributional RL algorithm. This makes it possible to span the complete potential trade-off between risk minimisation and expected return maximisation, in contrast to fully risk-averse methodologies. Fundamentally, this research yields a truly practical and accessible solution for learning risk-sensitive policies with minimal modification to the distributional RL algorithm, and with an emphasis on the interpretability of the resulting decision-making process.
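A minimal sketch of how such a utility $U$ could be computed from a categorical (C51-style) return distribution $Z$; the interpolation weight and the choice of CVaR as the tail risk measure are illustrative assumptions, not necessarily the paper's exact formulation:

```python
import numpy as np

# Support atoms and probabilities of a categorical return distribution Z(s, a),
# as learnt by a C51-style distributional RL agent (random stand-in here).
atoms = np.linspace(-10.0, 10.0, 51)
probs = np.random.dirichlet(np.ones(51))

def utility(atoms, probs, lam=0.5, alpha=0.1):
    """Interpolates between the expected return (lam=0) and the CVaR of the
    alpha-tail of Z (lam=1). Assumes atoms are sorted in increasing order."""
    expected_return = np.dot(probs, atoms)
    cum = np.cumsum(probs)
    k = np.searchsorted(cum, alpha) + 1       # atoms forming the alpha-tail
    tail = probs[:k].copy()
    tail[-1] -= max(cum[k - 1] - alpha, 0.0)  # clip the tail mass exactly at alpha
    cvar = np.dot(tail, atoms[:k]) / alpha    # mean return over the worst alpha-tail
    return (1 - lam) * expected_return + lam * cvar

# Greedy action selection then uses argmax_a U(s, a) instead of argmax_a Q(s, a).
print(utility(atoms, probs))
```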
For applications that require processing large amounts of text at inference time, Large Language Models (LLMs) are handicapped by their limited context windows, which are typically 2048 tokens. In-context learning, an emergent phenomenon in LLMs of sizes above a certain parameter threshold, constitutes one significant example because it can only leverage training examples that fit into the context window. Existing efforts to address the context window limitation involve training specialized architectures, which tend to be smaller than the sizes in which in-context learning manifests due to the memory footprint of processing long texts. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks (``windows'') that fit within the architecture, restrict the attention mechanism to apply only within each window, and re-use the positional embeddings among the windows. We test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. Our results motivate further investigation of Parallel Context Windows as a method for applying off-the-shelf LLMs in other settings that require long text sequences.
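A minimal sketch of the two ingredients of PCW, a block-diagonal attention mask over the context windows and positional ids re-used across them; the window layout is illustrative and not tied to any particular LLM:

```python
import torch

def pcw_mask_and_positions(window_len, n_windows, n_task_tokens):
    """Build an attention mask in which context tokens attend only within
    their own window, while the task tokens at the end attend to everything,
    and positional ids that repeat across windows so no index exceeds the
    model's trained context length."""
    n_ctx = window_len * n_windows
    total = n_ctx + n_task_tokens
    mask = torch.zeros(total, total, dtype=torch.bool)
    for w in range(n_windows):
        s = w * window_len
        mask[s:s + window_len, s:s + window_len] = True  # within-window attention
    mask[n_ctx:, :] = True                               # task tokens see all windows
    mask &= torch.tril(torch.ones(total, total, dtype=torch.bool))  # keep causality
    positions = torch.cat([
        torch.arange(window_len).repeat(n_windows),            # re-used per window
        torch.arange(window_len, window_len + n_task_tokens),  # task tokens follow
    ])
    return mask, positions

mask, positions = pcw_mask_and_positions(window_len=4, n_windows=3, n_task_tokens=2)
```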
Dual encoders are now the dominant architecture for dense retrieval. Yet, we have little understanding of how they represent text, and why this leads to good performance. In this work, we shed light on this question via distributions over the vocabulary. We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space. We show that the resulting distributions over vocabulary tokens are intuitive and contain rich semantic information. We find that this view can explain some of the failure cases of dense retrievers. For example, the inability of models to handle tail entities can be explained via a tendency of the token distributions to forget some of the tokens of those entities. We leverage this insight and propose a simple way to enrich query and passage representations with lexical information at inference time, and show that this significantly improves performance compared to the original model in out-of-domain settings.
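A minimal sketch of the vocabulary projection, assuming a BERT-based dual encoder whose vectors live in the hidden space of the pretrained masked-language-modelling head; the model name is illustrative:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

vec = torch.randn(1, 1, 768)  # stand-in for a query or passage embedding
with torch.no_grad():
    logits = mlm.cls(vec)     # BERT's MLM head maps hidden size -> vocab size
    dist = logits.softmax(dim=-1).squeeze()

# The dominant tokens of this distribution are what the representation "says".
top = dist.topk(5)
print(tokenizer.convert_ids_to_tokens(top.indices.tolist()))
```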
Deep Neural Networks (DNNs) are becoming increasingly important in assisted and automated driving. Using such components obtained through machine learning is inevitable: tasks such as recognizing traffic signs cannot reasonably be developed with traditional software development methods. DNNs, however, have the problem that they are mostly black boxes and therefore hard to understand and debug. One particular problem is that they are prone to hidden backdoors: the DNN misclassifies its input because it considers properties that should not be decisive for the output. Backdoors may be introduced either by malicious attackers or by inappropriate training. In any case, detecting and removing them is important in the automotive domain, as they might lead to safety violations with potentially severe consequences. In this paper, we introduce a novel method to remove backdoors. Our method works for both intentional and unintentional backdoors, and we do not require prior knowledge about the shape or distribution of the backdoors. Experimental evidence shows that our method performs well on several medium-sized examples.
In this paper, we identify the best learning scenario for training a team of agents to compete against multiple possible strategies of opposing teams. We evaluate cooperative value-based methods in a mixed cooperative-competitive environment, restricting ourselves to the case of a symmetric, partially observable, two-team Markov game. We selected three training methods based on the centralised training and decentralised execution (CTDE) paradigm: QMIX, MAVEN and QVMix. For each method, we considered three learning scenarios that differ in the variety of team policies encountered during training. For our experiments, we modified the StarCraft Multi-Agent Challenge environment to create competitive environments in which both teams learn and compete simultaneously. Our results suggest that training against multiple evolving strategies achieves the best results when performance is scored against several opposing strategies.
We introduce a new benchmark dataset, Placenta, for node classification in an underexplored domain: predicting microanatomical tissue structures from cell graphs in placenta histology whole slide images. This problem is uniquely challenging for graph learning for a few reasons. Cell graphs are large (>1 million nodes per image), node features are varied (64-dimensions of 11 types of cells), class labels are imbalanced (9 classes ranging from 0.21% of the data to 40.0%), and cellular communities cluster into heterogeneously distributed tissues of widely varying sizes (from 11 nodes to 44,671 nodes for a single structure). Here, we release a dataset consisting of two cell graphs from two placenta histology images totalling 2,395,747 nodes, 799,745 of which have ground truth labels. We present inductive benchmark results for 7 scalable models and show how the unique qualities of cell graphs can help drive the development of novel graph neural network architectures.
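As a minimal sketch of how a scalable inductive model might be trained on cell graphs of this size, the snippet below uses neighbour sampling with PyTorch Geometric on a random stand-in graph rather than the released data; all hyperparameters are illustrative:

```python
import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader
from torch_geometric.nn import GraphSAGE

# Random stand-in for a cell graph: 64-dim node features, 9 tissue classes,
# and ground-truth labels available only for a subset of nodes.
data = Data(
    x=torch.randn(10_000, 64),
    edge_index=torch.randint(0, 10_000, (2, 50_000)),
    y=torch.randint(0, 9, (10_000,)),
    train_mask=torch.rand(10_000) < 0.3,
)

loader = NeighborLoader(data, num_neighbors=[10, 10], batch_size=512,
                        input_nodes=data.train_mask)
model = GraphSAGE(in_channels=64, hidden_channels=128, num_layers=2, out_channels=9)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

for batch in loader:
    optim.zero_grad()
    out = model(batch.x, batch.edge_index)
    # Only the seed nodes (the first batch_size nodes) contribute to the loss.
    loss = torch.nn.functional.cross_entropy(out[:batch.batch_size],
                                             batch.y[:batch.batch_size])
    loss.backward()
    optim.step()
```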
Diffusion models are a class of generative models that, compared to other generative models, show superior performance in creating realistic images when trained on natural image datasets. We introduce DISPR, a diffusion-based model for solving the inverse problem of predicting three-dimensional (3D) cell shapes from two-dimensional (2D) single-cell microscopy images. Using the 2D microscopy image as a prior, DISPR is conditioned to predict realistic 3D shape reconstructions. To demonstrate the applicability of DISPR as a data augmentation tool in a feature-based single-cell classification task, we extract morphological features from cells grouped into six highly imbalanced classes. Adding features predicted by DISPR to the three minority classes improved the macro F1 score from $F1_\text{macro} = 55.2 \pm 4.6\%$ to $F1_\text{macro} = 72.2 \pm 4.9\%$. As ours is the first method to employ a diffusion-based model in this context, we demonstrate that diffusion models can be applied to inverse problems in 3D and that they learn to reconstruct 3D shapes with realistic morphological features from 2D microscopy images.
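A minimal sketch of one training step of a conditional diffusion model of this kind, with the 2D image acting as the conditioning prior; the denoiser, noise schedule and broadcasting scheme are placeholders, not the DISPR architecture:

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(denoiser, shape3d, image2d):
    """shape3d: (B, C, D, H, W) ground-truth volumes; image2d: (B, C, H, W)."""
    t = torch.randint(0, T, (shape3d.size(0),))
    noise = torch.randn_like(shape3d)
    a = alpha_bar[t].view(-1, 1, 1, 1, 1)
    noisy = a.sqrt() * shape3d + (1 - a).sqrt() * noise  # forward process q(x_t | x_0)
    # Condition on the 2D microscopy image, e.g. broadcast along the z-axis.
    cond = image2d.unsqueeze(2).expand(-1, -1, shape3d.size(2), -1, -1)
    pred = denoiser(torch.cat([noisy, cond], dim=1), t)  # predict the added noise
    return F.mse_loss(pred, noise)

# Usage with a trivial stand-in denoiser (a real one would be a 3D U-Net):
net = torch.nn.Conv3d(2, 1, kernel_size=3, padding=1)
loss = training_step(lambda x, t: net(x),
                     torch.randn(4, 1, 8, 32, 32), torch.randn(4, 1, 32, 32))
```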
Decoding language from brain activity is a long-awaited goal in both healthcare and neuroscience. Thanks to intracranial devices, major milestones have recently been reached: subject-specific pipelines trained on invasive brain responses to basic language tasks are now beginning to efficiently decode interpretable features (e.g. letters, words, spectrograms). However, scaling this approach to natural speech and non-invasive brain recordings remains a major challenge. Here, we propose a single end-to-end architecture, trained with contrastive learning across a large cohort of individuals, to predict self-supervised representations of natural speech. We evaluated our model on four public datasets comprising 169 volunteers recorded with magneto- or electro-encephalography (M/EEG) while they listened to natural speech. The results show that our model can identify, from 3 s of MEG signals, the corresponding speech segment with up to 72.5% top-10 accuracy (and 44% top-1 accuracy) out of 1,594 distinct segments, and up to 19.1% out of 2,604 segments for EEG recordings, thus allowing the decoding of phrases absent from the training set. Model comparison and ablation analyses show that this performance benefits directly from our original design choices, namely (i) the contrastive objective, (ii) pretrained representations of speech and (iii) a common convolutional architecture trained simultaneously across several participants. Together, these results delineate a promising path towards decoding natural language processing in real time from non-invasive recordings of brain activity.
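A minimal sketch of the contrastive objective, matching each brain segment to the representation of the speech segment heard at the same time; the encoders producing the two embeddings are illustrative stand-ins:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(brain_z, speech_z, temperature=0.1):
    """brain_z: (batch, d) M/EEG embeddings; speech_z: (batch, d) embeddings of
    the matching speech segments (e.g. from a self-supervised speech model)."""
    brain_z = F.normalize(brain_z, dim=-1)
    speech_z = F.normalize(speech_z, dim=-1)
    logits = brain_z @ speech_z.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(brain_z.size(0))        # i-th brain row matches i-th speech row
    return F.cross_entropy(logits, targets)

loss = contrastive_loss(torch.randn(32, 256), torch.randn(32, 256))
```

At test time, the same similarity matrix ranks the candidate segments, which is how top-1 and top-10 identification accuracies such as those above would be scored.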
Deep neural networks for survival prediction outperform classical approaches in discrimination, i.e. the ranking of patients by their time-of-event. Conversely, classical approaches such as the Cox proportional hazards model display better calibration, i.e. the correct temporal prediction of events under the underlying distribution. Especially in the medical domain, where predicting the survival of a single patient is crucial, both discrimination and calibration are important performance metrics. Here we present Discrete Calibrated Survival (DCS), a novel deep neural network for discriminated and calibrated survival prediction that outperforms competing survival models in discrimination on three medical datasets, while achieving the best calibration among all discrete-time models. The enhanced performance of DCS can be attributed to two novel features: variable temporal output node spacing and a novel loss term that optimizes the use of uncensored and censored patient data. We believe DCS is an important step towards the clinical application of deep-learning-based, well-calibrated survival prediction.
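A minimal sketch of a discrete-time survival likelihood that handles both censored and uncensored patients, the standard loss family on which such models build; the variable node spacing and the paper's exact loss term are not reproduced here:

```python
import torch

def discrete_survival_nll(hazards, event_bin, observed):
    """hazards: (batch, n_bins) conditional event probabilities per time interval.
    event_bin: (batch,) interval of the event (or of censoring).
    observed: (batch,) 1 if the event was seen, 0 if the patient was censored."""
    eps = 1e-8
    log_h = torch.log(hazards + eps)
    log_1mh = torch.log(1.0 - hazards + eps)
    cum_log_s = log_1mh.cumsum(dim=1)                    # log P(survive bins 0..k)
    idx = event_bin.unsqueeze(1)
    log_s_through = cum_log_s.gather(1, idx).squeeze(1)  # survived the bin as well
    log_s_before = log_s_through - log_1mh.gather(1, idx).squeeze(1)
    log_lik = torch.where(observed.bool(),
                          log_s_before + log_h.gather(1, idx).squeeze(1),  # event in bin
                          log_s_through)                                   # censored
    return -log_lik.mean()

loss = discrete_survival_nll(torch.rand(8, 20) * 0.2,
                             torch.randint(0, 20, (8,)),
                             torch.randint(0, 2, (8,)))
```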